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Abstract. We analyze a sublinear RA£SFA (Randomized Algorithm for Sparse Fourier Analysis) that finds 
a near-optimal 73-term Sparse Representation R for a given discrete signal S of length TV, in time and space 
poly(B,log(N)), following the approach given in [3]. Its time cost poly(log(N)) should be compared with the 
superlinear f2(TVlogTV) time requirement of the Fast Fourier Transform (FFT). A straightforward implementation 
of the RA^SFA, as presented in the theoretical paper turns out to be very slow in practice. Our main result 
is a greatly improved and practical RA^SFA. We introduce several new ideas and techniques that speed up the 
algorithm. Both rigorous and heuristic arguments for parameter choices are presented. Our RA/TSFA constructs, 
with probability at least 1 — <5, a near-optimal B-term representation R in time poly(B) log(N) log(l/<5)/e 2 log(M) 
such that ||5 - -RH2 < (1 + e)\\S — -R op t|||. Furthermore, this RA^SFA implementation already beats the FFTW for 
not unreasonably large TV. We extend the algorithm to higher dimensional cases both theoretically and numerically. 
The crossover point lies at TV ~ 70000 in one dimension, and at TV ~ 900 for data on a TV X TV grid in two dimensions 
for small B signals where there is noise. 
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1. Introduction. We shall be concerned with discrete signals S = (S(Q), . . . , S(N — 1)) £ C N 
and their Fourier transforms S = 0(0), . . . , S(N-l)), defined by S(lu) = ^= Y^ 1 S(t)e~ 2mt / N . 
In terms of the Fourier basis functions (/> u (t) — -2= e 2m ^ t / N ; 5" can be written as S = J2^Iq S(w)c/> u (t); 
this is the (discrete) Fourier representation of 5*. 

In many situations, a few large Fourier coefficients already capture the major time-invariant 
wave-like information of the signal and very small Fourier coefficients can thus be discarded. The 
problem of finding the (hopefully few) largest Fourier coefficients of a signal that describe most 
of the signal trends, is a fundamental task in Fourier Analysis. Techniques to solve this problem 
are very useful in data compression, feature extraction, finding approximating periods and other 
data mining tasks 0, as well as in situations where multiple scales exist in the domain (as in e.g. 
materials science), and the solutions have sparse modes in the frequency domain. 

Let S be a signal that is known to have a sparse B-term Fourier representation with B <C TV, 

i.e., 

S(t) = -^—(a^ 2 ™^/ 1 * + ... + a B e l2 ™ Bt / N ), (1.1) 

V TV 



and let us assume that it is possible to evaluate S, at arbitrary t, at cost O(l) for every evaluation. 

To identify the parameters a\, . . . , <2b, u)\, . . . , u>b, one can use the Fast Fourier Transform 
(FFT). Starting from the V point-evaluations 5(0), . . . , S(N- 1), the FFT computes all the Fourier 
coefficients; one can then take the largest B coefficients and the corresponding modes. The time 
cost for this procedure is fi(Vlog V); this can become very expensive if V is huge. (Note that all 
logarithms in this paper are with base 2, unless stated otherwise.) The problem becomes worse in 
higher dimensions. If one uses grids of size V in each of d dimensions, the total number of points is 
N d and the FFT procedure takes fl(dN d log N) time. It follows that identifying a sparse number 
of modes and amplitudes is expensive for even fairly modest V. Our goal in this paper is to discuss 
much faster algorithms that can identify the coefficients oj, . . . , clb and the modes u>\, . . . , ujg in 
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equation These algorithms will not use all the samples 5(0), . . . , S(N — 1), but only a very 

sparse subset of them. 

In fact, we need not restrict ourselves to signals that are exactly equal to a i?-term represen- 
tation. Let us denote the optimal B-term Fourier representation of a signal S by R^ pt (S); it is 
simply a truncated version of the Fourier representation of S, retaining only the B largest coef- 
ficients. We are then interested in identifying (or finding a close approximation to) R^ pt (S) via 
a fast algorithm. The papers |S] [E] 01 provide such algorithms; all compute a (near-)optimal B- 
term Fourier representation R in time and space poly(B, log(l/5), log N, logM, 1/e), such that 
US' — i?|| 2 < (1 + e )\\S — Ro P t(S)\\2> with success probability at least 1 — 5, where M is an a priori 
given upper bound on ||S||2- The algorithms in these papers share the property that they need 
only some random subsets of the input rather than all the data; they differ in many details: the 
different papers assume different conditions on N, (for example, N is assumed to be a power of 2 
or a small prime number in 0; N may be arbitrary but is preferably a prime in 0); the algorithms 
also use different schemes to locate the significant modes. (Here we say a mode u> is significant if 
for some pre-set 77, ^(w)) 2 > /7||5|| 2 .) Mansour and Sahar 7 implemented a similar algorithm for 
Fourier analysis on the set Z£ , where our algorithm is for Fourier analysis on Zjv- 

The results of P] can be extended to more general representations, with respect to a particular 
basis or a family of bases; examples are wavelet bases, wavelet packets or Fourier bases. We shall 
use the acronym RA£STA (Randomized Algorithm for Sparse Transform Analysis) for this family 
of algorithms. We here restrict ourselves to the Fourier case and thus RAffiFA. 

For a wide range of applications, the speed potential suggested by the sublinear cost of these 
algorithms is of great importance. In this paper, we concentrate on the approach proposed in 
Note that [3j gives a theoretical rather than a practical analysis in the sense that it does not discuss 
parameter settings; it gives few hints about the order of the polynomial in B and log./V; in fact, a 
straightforward implementation of RA^SFA following the set-up of [3j turns out to be too slow to 
be practical, so that none of the direct implementation work was published. In addition, did not 
discuss extensions to higher dimensions, where the pay-off of RA^SFA versus the FFT is expected 
to be larger. 

Our main result in this paper is a version of RA^SFA that addresses these problems. We give 
theoretical and heuristic arguments for the setting of parameters; we introduce some new ideas that 
produce a practical RA^SFA implementation. Our new version can outperform the FFTW when 
N is around 70, 000 and B is small. 

A Motivating Example. RA^SFA is an exciting replacement for the FFT to solve multiscale 
models. Typically, one wants to simulate a multiscale model in several dimensions with both 
a microscopic and a macroscopic description. The solution to the model has rapidly oscillating 
coefficients with period proportional to a small parameter e. For examples of multiscale problems 
of size N that are dominated by the behavior of B <C N Fourier components, see e.g JIJ. In a 
traditional (pseudo-)spectral method, one computes the spatial derivatives by the FFT and Inverse 
FFT at each time iteration; consequently the time to find the Fourier representation of a signal is 
the determining factor in the overall time of simulation. In multiscale problems, where only a small 
number of Fourier modes contribute to the energy of an initial condition and coefficient functions, 
we expect that RA£SFA will significantly speed up the calculation for large N. In fact, a preliminary 
study has shown [§] that for some transport and diffusion equations with multiple scales, using only 
significant frequencies to approximate intermediate solutions does not substantially degrade the 
quality of the approximate final solution to the multiscale problem. By using the most significant 
frequencies and RA£SFA instead of all frequencies and the FFT, we could replace a superlinear 
algorithm by a poly-log (polynomial in the logarithm) algorithm. The corresponding decrease of 
the running time would make it possible to handle a larger number of grid points in high dimensions. 
We shall present detailed applications of this algorithm in multiscale problems in |12j . 

Notation and Terminology. For any two frequencies u>\, u>2, where u>i =^ u>2, we say that 
is bigger than S(lu 2 ) if \§(wi)\ > \S(lo 2 )\. The squared norm ||S|| 2 = JX" 1 \S(t)\ 2 of S is 
also called the energy of S; we shall refer to ^(w)! 2 as the energy of the Fourier coefficient S(uj). 
Similarly, the energy of a set of Fourier coefficients is the sum of the squares of their magnitudes. 
We shall use only the £ 2 -norm in this paper; for convenience, we therefore drop the subscript from 
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now on, and denote ||F||| by ||-F|| 2 for any signal F. 

We denote the convolution by F * G, (F * G)(t) = £ s F(s)G(t - s). It follows that F~7~G = 
y/NFG. We denote by \t the signal that equals 1 on a set T and zero elsewhere. The index to 
Xt may be either time or frequency; this is made clear from context. For more background on 
Fourier analysis, see (TJ. The support supp(F) of a vector F is the set of t for which F(t) 7^ 0. 
A signal is 98% pure if there exists a frequency ui and some signal p, such that S = a<f> u + p and 
|a| 2 > 0.98||S|| 2 . 

RA^SFA is a randomized algorithm. By this, we do not mean the signal is randomly chosen 
from some kind of distribution, with our timing and memory requirement estimates holding with 
respect to this distribution; on the contrary, the signal, once given to us, is fixed. The randomness 
lies in the algorithm. After random sampling, certain operations are repeated many times, on 
different subsets of samples, and averages and medians of the results are computed. We set in 
advance a desired probability of success 1 — 8, where 8 > can be arbitrarily small. Then the claim 
is that for each arbitrary input S, the algorithm succeeds with probability 1 — 8, i.e., gives a B-term 
estimate R such that \\S — R\\ 2 < (1 + e)\\S — i?^, t || 2 . For given e, 8, numerical experiments show 
that the algorithm may take 0{B 2 log A) time and space. 

Organization. The chapters are organized as follows. Section 2 shows the testbed and numeri- 
cal experiments about the comparison of our RA^SFA and the FFTW. In Section 3, we introduce all 
the new techniques and ideas of RA^SFA (different from [3]) and its extension to multi-dimensions. 

2. Testbed and Numerical Results of RAfSFA. In this section, we present numerical 
results of RAfSFA. We begin in Section |2~T1 with comparing the running time of RAffiFA and the 
FFTW for some one dimensional test examples. In Section l2~2*l the performances of two dimensional 
RA^SFA and the FFTW for some test signals are shown. 

The randomness of the algorithm implies that the performance differs each time for the same 
group of parameters. Hence, we give the average data, bar and quartile graph based on 100 runs 
as well as the fastest data among these experiments. The popular software FFTW [2] version 2.1.5 
is used to determine the timing of the Fast Fourier Transform for the same data. 

The test signals are either superpositions of B <C N modes in the frequency domain, that 
is, S = 'Yli^ = iCj4> ulj , contaminated with Gaussian white noise, or signals for which the Fourier 
coefficients exhibit rapid decay, so that a B-vaode approximation with B <C N will already be very 
accurate. Different choices of the Wj were checked; these did not influence the whole execution time. 
These choices included cases where some frequencies were close; note that this is the "hard" case 
for most estimation algorithms. For RA^SFA, which contains random scrambling operations (that 
are later described), the distance between the modes does not matter if N is prime. If N is not 
prime, then gcd{u>i — u>2, N) cannot decrease by the scrambling operation, so that different (ui, u>2) 
pairs may (in theory) lead to different performances; in practice, this doesn't seem to matter. In 
all these situations, RA^SFA reliably estimates the size and locations of the few largest coefficients. 
We also set other parameters as follows: accuracy factor e = 1 2 1 1 -<S r ) | , failure probability 8 = 0.05. 

The parameter choices in the algorithm are quite tricky. The theoretical bounds given in 
do not work well in practice; instead much smaller parameters and heuristic settings work more 
efficiently. 

All the experiments were run on an AMD Athlon(TM) XP1900+ machine with Cache size 
256KB, total memory 512 MB, Linux kernel version 2.4.20-20.9 and compiler gec version 3.2.2. 

2.1. Numerical Results in one dimension. The first implementation results of RA^SFA 
were not published; the program was basically a proof of concept, not optimized. With the choices 
and parameters described in [2), it was extremely slow and thus not practical for real- world appli- 
cations. The implementation we present here runs several order of magnitude faster; this involves 
introducing many adjustments and ideas to the algorithm of [5]. (See Section 3 for details.) 

The goal of this paper is to check the possibility to replace the FFT with RA^SFA for sparse 
and long signals. Therefore, we focus on comparing the performance of RAffiFA and FFTW in the 
following subsections. 

2.1.1. Experiments for an Eight-mode Representation. We begin with the experiments 
for recovering a signal consisting of eight modes (with and without noise). In the noisy signal 
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case, the noise is a Gaussian white noise with signal-to-noise ratio (SNR, defined as 101og 10 j^t) 
approximately 5dB. The coefficients are randomly taken from the interval [1, 10] and the significant 
modes from [0, N — 1]. 

Two kinds of running time for each algorithm are provided. One is the total running time and 
another is the running time excluding the sampling time. As we know, the FFT takes Q(N) to 
compute all signal values. On the other hand, our algorithm doesn't need all the sample values. 
All our conclusions are based on the time excluding the sampling. However, we still list the running 
time including sampling time as well because of the existence of various forms of data in practice. 
For example, in pseudospectral applications, the data need to be computed from a B-superposition, 
which may take 0(B) per sample. It is possible to sample more quickly, which is addressed in [J. 
On the other hand, if the data is already stored in a file or a disk, we simply get them without 
any computation. In all these cases, we assume the data is either already in memory or available 
through computation. Thus we don't need to go through every data, which would take time O(N). 

Table |2~T1 provides a comparison of the running times of the FFTW and RA^SFA for eight-mode 
clean and noisy signals. In the beginning when N is small, the FFTW is almost instantaneous. As 
the signal length TV increases, its time grows superlinearly. On the contrary, RA^SFA takes longer 
time in smaller N cases; however the time cost remains almost constant regardless of the signal 
length. In addition, the benchmark FFTW software fails to process more than 10 8 data because it 
runs out of the memory space. In contrast, RA^SFA has no difficulty at all since it does not need 
all the data. A simple interpolation from the entries in Table 12.11 predicts that RA£SFA beats the 
FFTW when N > 15, 200 for eight-mode signals, all the more convincingly when N is larger. If we 
compare the time including sample computation, the cross-over point would be N = 70, 000. The 
table also shows the linear relationship between the time cost and the logarithm of the length N. 



Length 


Time of 


Time of 


Time of RAffiFA 


Time of FFTW 


N 


RA£SFA 


FFTW 


(excluding sampling) 


(excluding sampling) 




clean 


noisy 




clean 


noisy 




10 3 


0.22 


0.25 





0.01 


0.02 





10 4 


0.25 


0.29 


0.04 


0.03 


0.04 


0.01 


10 5 


0.32 


0.34 


0.46 


0.05 


0.05 


0.17 


10 B 


0.37 


0.41 


5.01 


0.07 


0.08 


2.23 


10 7 


0.44 


0.48 


54.57 


0.10 


0.11 


26.24 



Table 2.1 

Time Comparison between RAISFA and FFTW (B=8) based on 100 runs. "Clean" means that the test signal 
is pure. "Noisy" means the signal is contaminated with noise of SNR = 5dB. "Excluding Sampling" column lists 
the running time without precomputation of sample values. 

As can be expected from a randomized algorithm, RAISFA has a different performance in each 
run. Figure 12.21 illustrates the spread of the execution time (including sampling) for pure signals 
over 100 runs. 

2.1.2. Experiments with Different Levels of Noise. In the experiments above, we com- 
pared the performance of clean and slightly noisy signals. Here, we shall push the noise level much 
higher, keeping N and B fixed to illustrate the effect of noise. Also, instead of allowing the algo- 
rithm to run for poly(B, log N, 1/e, log(l/5)) iterations, we set a smaller fixed upper bound (so that 
the success probability is no longer 1 — S). When noise is present, it influences the success proba- 
bility with which modes with small amplitude are detected. To explore this, we ran an experiment 
with only a single mode; we kept the amplitude of the mode constant and increased the noise. 
Figure EU (left) shows the success probability of the detection of the single mode by the algorithm 
(estimated by running 100 trials each time and recording the number that were successful) for three 
different settings of the maximum number of iterations. 

The dependence of the running time on the SNR in the case of detection of a single mode is 
illustrated in Figure EH (right), where we show the results of the average over 100 runs for every 
data point, with only a very loose a priori restriction on the running time (<1000 iterations); only 
parameter settings with over 50% success probability were taken into account. 
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Fig. 2.2. Comparing the total running times of 8-mode RAISFA for 100 different runs of the randomized 
algorithm. Left: mean and variance as a function of N ; right: median, quartiles and total spread of the runs as a 
function of N , B = 8 




Fig. 2.3. Experiments for signal S = <j>o + noise with length N = 10, 009. Compare the success rate and running 
time of RAISFA when the total number of iterations is bounded by 100, 500 and 1000 (respectively), based on 100 
different runs of the randomized algorithm in each case. Left: success probability of RAISFA as SNR decreases; 
right: running time of RAISFA as SNR decreases (we only show the running time when success probability is 
greater than 50%). Note that the abscissa show —SNR each time, meaning that the i 2 -norm of the noise is much 
larger than that of the signal in the regimes illustrated here; for instance, SNR = —60dB means that the I 2 -norm 
of the noise equals 1,000 X £ 2 norm of the signal. 



This experiment indicates that it is possible to detect modes that are significantly weaker than 
the noise, within limits, of course. If the amplitude of the signal is too weak, then trying to detect 
it may waste many resources. In practice we shall put our cut-off on the amplitude at about one 
sixth of the noise level, i.e., at cr/6; this can of course be adjusted depending on whether one wishes 
fast speed or not. 

Although SNR is the standard characterization of noise intensity, it is not clear that it is the 
parameter that matters most for our algorithm. We therefore also ran an experiment in which we 
compare the results for two different values of N: 10,009 (as in the Figures above) and 100,003, 
respectively. The second value of ./V is about 10 times larger than the first; for the same choices of 
a and c (the amplitude of the single mode), the SNR for the second TV is smaller by lOdB. Table 
12.41 comparing the performance for these two values of N and several choices of a, shows that the 
value of a itself rather than SNR governs the running time and success probability. 

2.1.3. Experiments with Different Numbers of Modes. The crossover points for N are 
different for signals with different B; the number of modes has an important influence on the running 
time. To investigate this, we experimented with fixed N (we took a prime number N = 2, 097, 169 
(a prime number) for RA£SFA and N = 2 21 = 2, 097, 152 for FFTW) but varying B. In all cases, 
we take S to be a superposition of exactly B modes, i.e., S(t) = Ylf=i c i4>uii for some B. Table |2~B1 
compares the running time for different B using the FFTW and RA£SFA. For small B, RA£SFA 
takes less time because N is so large. The execution time for the FFT can be taken to include 
the time for evaluation of all the samples (which increases linearly in B) or not (in which case the 
execution time is constant to B). In both cases, the FFTW overtakes RAISFA as B increases; the 
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N t = 10, 009 


N 2 = 100, 003 


a 


success piouduiuiy 


time 


<? AT F? 


success probability 


tinic 


Q AT F? 


9 


100% 


0.11 


-46.02 


100% 


0.19 


-56.02 


2.5 


93% 


0.32 


-47.96 


77% 


0.55 


-57.96 


3 


49% 


0.38 


-49.54 


27% 


0.61 


-59.54 


3.5 


21% 


0.45 


-50.88 


10% 


0.38 


-60.88 


4 


13% 


0.38 


-52.04 


1% 


0.37 


-62.04 



Table 2.4 



Exploring the dependence on a versus SNR of the influence of the noise on processing the signal S = <j>o + noise, 
where the noise is gaussian N(0,cr). For two different values of N, Ni = 10,009 and N2 = 100,003 ~ lOJVi, 
respectively, and a range of values for a, we determined the success probability within 100 runs, and the average 
running time for successful runs. In both cases we see a clear transition as a increases; the location of the transition 
(between 2.5 and 4 for Ni, between 2 and 3.5 for N2) shifts slightly with N , but it is nevertheless clear that a is 
a better parameter to track than SNR: in fact, the largest choice for a, a = 4, still has lower SNR in the case 
N = N\ than the smallest choice, a = 2, for N = N2, yet the success probability and running time are much worse. 



execution time of the FFTW is constant or linear in the number of modes B (depending on whether 
the evaluation of samples is included), while that of RAffiFA is polynomial of higher order. For 
N = 2, 097, 169, the FFTW is faster than RA^SFA when B > 33. By regression techniques on the 
experimental data, one empirically finds that the order of B in RA£SFA is quadratic. This is the 
main disadvantage of RA£SFA. (Although this nonlinearity in B was expected by the authors of 
[3] , the observation that it played such an important role even for modest B was the motivation for 
Gilbert, Muthukrishnan and Strauss to construct in 0| a different version of RA^SFA that is linear 
in B for all N.) Hence, RA^SFA is most useful for a long signal with a small number of modes. 



Number of modes 
B 


Time of 
RA^SFA 


Time of 
FFTW 


Time of RAffiFA 
(exclude sampling ) 


Time of FFTW 
(exclude sampling) 


2 


0.05 


7.49 


0.03 


5.46 


4 


0.14 


9.38 


0.05 


5.46 


8 


0.35 


13.22 


0.07 


5.46 


16 


2.48 


20.92 


0.83 


5.46 


32 


15.53 


36.28 


4.13 


5.46 


64 


107.55 


67.16 


39.55 


5.46 



Table 2.5 

Time Comparison between RAZSFA and FFTW for Different B when N ps 2,097, 169 



2.1.4. Experiments with Signals that have infinitely many modes with rapid decay 
in frequency. For our final batch of one-dimensional experiments, we ran the algorithm on the 
signal S — 1/(1.5 + cos2-7rf) + noise. In continuous time, the clean signal has infinitely many 
modes with amplitudes that decay exponentially as the frequency of the mode increases. We 
ran the experiment with a white Gaussian noise once with SNR —20dB and a second time with 
SNR = —8dB, with N = 1000. The threshold for the amplitudes of modes we wished to find was 
adjusted to the noise level in both cases. 

The results are shown in Figure (S NR = -20dB)and Figure E2I {SNR = -8dB), respec- 
tively. For SNR = ~20dB, the Fourier coefficients obtained by FFTW are all very close to the 
"noise floor", i.e., they lie in a band of amplitude close to the value of a. For SNR = —8dB, a 
is smaller {a = 2.6), and we find the "noise floor" in the FFTW computation at this lower level. 
The three largest modes of the signal have amplitudes significantly higher than this a, and FFTW 
finds them with reasonable accuracy. In contrast, RA£SFA (shown on the left in both figures; only 
1 run is shown) hits all the coefficients exceeding a "on the nose" , in both cases; it also finds all the 
central 15 modes exactly in the SNR = —8dB case, even if they have values significantly smaller 
than a. This experiment illustrates the great robustness of RA£SFA to noise and its ability to 
detect harmonic components with smaller energy than the white noise, already seen in 12. 1.21 
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Fig. 2.6. For signal S = 1/(1.5 + cos 2nt) + noise with SNR = —20dB. Compare the approximation effect 
by RAISFA and FFTW. Left: approximation of the significant coefficients by RAISFA; the relative approximation 
error is 0.74%; right: approximation of the significant coefficients by FFTW. 



Recover coefficients by Raisfa (SNR=-8dB, a=2.6) 
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Fig. 2.7. For signal S = 1/(1.5 + cos2-7rt) + noise with SNR = —8dB. Compare the approximation effect 
by RAtSFA and FFTW. Left: approximation of the significant coefficients by RAISFA; the relative approximation 
error is 0A%;right: approximation of the significant coefficients by FFTW. (this is for the one run illustrated. In 
other runs, it makes similarly one or two mistakes, not necessarily at the same modes.) 



2.2. Numerical Results in Two Dimensions. The number of grid points depends expo- 
nentially on the dimension. To achieve reasonable accuracy, a minimum N is required in each 
dimension; however, when d > 1, the FFTW has great difficulty in handling the corresponding N d 
points for even modest N. RAISFA does not have this problem. 

2.2.1. Experiments for Eight-mode Signals in Two Dimensions. We take the signal 
S = J2k=i c k4>uj x k 4>u> y fc i where B = 8, e = 10 2 1 1 1 1 , S = 0.05. The parameter N is the number of 
grid points in each dimension, random complex constants Ck with real and imaginary parts in [1, 10], 
and uj Xi k and to Vt k are random integers from 0, . . . , N — 1. As Table 12.81 shows, two dimensional 
RAISFA surpasses two dimensional FFTW when N > 1500. In particular, when N = 5000 and 
the computation for samples is not included, the FFTW takes 21 seconds and RAISFA only less 
than 5 second. When we include the sampling time, the crossover point becomes N = 900. The 

crossover point for N is 70000 for d = 1, and 900 for d = 2; if we conjecture that the crossover N 

i 

for 2-mode in d dimensions is given by c^n^ , then this leads us to guess that the crossover N for 
d = 3 may be close to 210. 

2.2.2. Experiments for Signals with Different Number of Modes B. As in one dimen- 
sion, the number of modes B is the bottleneck for applying RAISFA freely to signals that are not 
so sparse. Suppose the signal is of the form Sit) = J2k=i c k<Puj !C fc ^ y fc) with N = 3001 for RA£SFA 
and 3000 for FFTW. Table |2~§I illustrates the relationship between running time and the number 
of modes B. Time increases depends polynomially on the number of terms B. When N = 3001, 
the crossover points for the FFTW to surpass RAISFA are at B = 10 and B = 17 respectively, for 



8 



J. ZOU, A. GILBERT, M. STRAUSS, I. DAUBECHIES 



T ,pn o'tn 


Time of 


1 i T~n p nf 

_1_ 11I1C \JL 


Time of RAffiFA 


Time nf FFTW 

_____ LLLl\_, Ul -L J- 1 VV 


N 


RA£SFA 


FFTW 
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3.64 


U.Uo 


0.88 


1.05 


U.U4 


1UUU 


4.11 


4.54 


A 87 


1.04 


1.25 


u.zu 


2000 


4.76 


4.91 


20.86 


1.31 


1.44 


2.12 


3000 


4.55 


5.37 


47.73 


1.33 


1.70 


5.62 


4000 


5.41 


5.59 


85.89 


1.41 


1.51 


10.74 


5000 


6.03 


6.20 


138.27 


1.56 


1.66 


20.98 



Table 2.8 

Time Comparison between RAISFA and FFTW (B=8) based on 100 runs. "Clean" means that the test signal 
is pure. "Noisy" means the signal is contaminated with noise of SNR = —AdB. "Excluding Sampling" column lists 
the running time without including precomputation of sample values. 



including and excluding sample computation cases. This implies the influence of B on the execution 
time is far from negligible. 



Number of modes 
B 


Time of 
RAISFA 


Time of 
FFTW 


Time of RAffiFA 
(exclude sampling ) 


Time of FFTW 
(exclude sampling) 


2 


0.15 


16.45 


0.08 


5.64 


4 


0.52 


26.81 


0.14 


5.64 


8 


4.55 


47.73 


1.343 


5.64 


12 


19.37 


68.47 


8.82 


5.64 


16 


48.69 


89.13 


9.13 


5.64 


20 


114.80 


109.88 


22.75 


5.64 



Table 2.9 

Time Comparison between RAISFA and the FFTW for signals with different B in 2 dimensions when N = 3000 



3. Theoretical Analysis and Techniques of RA£SFA. We hope the numerical results have 
whetted the reader's appetite for a more detailed explanation of the algorithm. Before explaining 
the structure of RA£SFA as implemented by us, we review the basic idea of the algorithm. Given a 
signal consisting of several frequency modes with different amplitudes, we could split it into several 
pieces that have fewer modes. If one such piece had only a single mode, then it would be fairly 
easy to identify this mode, and then to approximately find its amplitude. If the piece were not 
uni-modal, we could, by repeating the splitting, eventually get uni-modular pieces. In order to 
compute the amplitudes, we need to "estimate coefficients." To verify the location of the modes 
in the frequency domain and concentrate on the most significant part of the energy, we use "group 
testing." An estimation that recurs over and over again in this testing is the "evaluation of norms." 
The first splitting of the signal is done in the "isolation" step. 

The different steps are carried out on many different variants of the signals, each obtained 
by a random translation in the frequency domain (corresponding to a modulation and the inverse 
dilation in the time domain). Because the signal is sparse in the frequency domain, the different 
modes are highly likely to be well separated after these random operations, facilitating isolation of 
individual modes. 

The main skeleton of the algorithm was already given in pjj; in our discussion here, we introduce 
new ideas and give the corresponding theoretical analysis. We also explain how to set parameters 
that are either not mentioned or loose in |3J- In Section l3~Tl the total scheme of RAISFA is given. 
In Section EOl we show the theoretical basis to choose parameters for estimating coefficients, and 
introduce some techniques to speed up the algorithm. In Section T3.3I we set the parameters for 
norm estimation. Section 13.41 presents the heuristic rules to pick the filter width for the isolation 
procedure. This is one of the key factors determining the speed. A new filter is proposed for Group 
Testing in Section [3.51 which works more efficiently. Section f3 . 61 discusses how to evaluate a random 
sample from a signal. Finally, we discuss the extension to higher dimensions in Section 13.71 
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3.1. Set-up of RA^SFA. The following theorem is the main result of 0. 

Theorem 3.1. Let an accuracy factor e, a failure probability 6, and a sparsity target B S 
N, B < N be given. Then for an arbitrary signal S of length N , RAiSFA will find, at a cost in 
time and space of order poly (BAog(N), i,log(j-)) and with probability exceeding 1 — S, a B—term 
approximation R to S, so that \\S — R\\ 2 < (1 + e)\\S — -R^^) Hi- 
lt is especially striking that the near-optimal representation R can be built in sublinear time i.e., 
poly (log N) instead of the 0(N log N) time requirement of the FFT. RA£SFA's speed will surpass 
the FFT as long as the length of the signal is sufficiently large. In particular, if S = Rf pt (S) (that 

is, S(u>) vanishes for all but B values of ui), then \\S — R\\ 2 = 0, i.e., RA£SFA constructs S without 
any error, at least in theory; in practice this means the error is limited by accuracy issues. 

The main procedure is a Greedy Pursuit with the following steps: 

Algorithm 3.2. Total Scheme 
Input: signal S, the number of nonzero modes B or its upper bound, accuracy factor e, success 
probability 1 — 6, an upper bound of the signal energy M , the standard deviation of the white 
Gaussian noise a , a ratio l for relative precision. 

1. Initialize the representation signal R to 0, set the maximum number of iterations T = 
B\og{N)\og{5)/e 2 , 

2. Test whether \\S — R\\ 2 < t||i?|| 2 . If yes, return the representation signal R and the whole 
algorithm ends; else go to step 3. 

3. Locate Fourier Modes ui for the signal S — R by the isolation and group test procedures 
below. 

.{ . Estimate Fourier Coefficients at oj: (S — R)(lo). 

5. If the total number of iterations is less than T, go to 2; else return the representation R. 

The test at stage 2, which is not in 3 , can allow us to end early. The criterion \\S— R\\ 2 < i||-R|| 2 , 
where l is a small number chosen heuristically, is suitable when one expects that S is sparse, up to 
a small energy contribution. (Note that step 2 does not use the exact value of \\S — R\\ 2 , which is 
not known; we use a procedure called norm estimation (see below) to give a rough estimate; this is 
good enough for the stop criterion. Other criteria could be substituted when appropriate.) 

In practice, we would not know how many modes a signal has. In fact, the algorithm doesn't 
really need to know B: it can just proceed until the residual energy is estimated to be below 
threshold. (The value of B is used only to set the maximum number of iterations, and the width of 
a filter in the isolation procedures below. For the maximum number T, a loose upper bound on B 
suffices; the isolation filter width depends only very weakly on B.) If either the residual energy or 
the threshold is large, the program would continue. Note that for each iteration of the algorithm, 
we take new random samples from the signal S. 

3.2. Estimate Individual Fourier Coefficients. The original RA£SFA only shows the va- 
lidity of estimating coefficients without mentioning parameter settings. Here we introduce a new 
technique to achieve better and faster estimation; in the process, we give another proof of Lemma 
2 in [3| that contains explicit parameter choices. 

Algorithm 3.3. Estimate Individual Fourier Coefficients 
Input: signal S , success probability 1 — 5, and accuracy factor e. 

1. Randomly sample from signal S with indices tij: S(tij), i — 1, . . . , 2 log(l/<5), j = 
0,...,8/e 2 . 

2. Take the empirical mean of the (S(tij), tfi^itj)) , j = 0, . . . , 8/e 2 , store as mean{i). 

3. Take the median y = median(mean(i)) , i = 1, . . . ,21og(l/5). 
4- Return y. 

LEMMA 3.4. Every application of Alaorithm \'3.$ constructs a realization of a random variable 
Z, that estimates the Fourier coefficient S(to), good up to tolerance e 2 \\S\\ 2 with high probability 
1 — S, i.e., 

Prob (\Z - S(uj)\ 2 > e 2 ||5|| 2 ) < 6. (3.1) 
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Proof. Define a random vector V as follows: 

V = (0, 0, • • • , NS(t),0, • • • , 0) = NS t S(t). (3.2) 
where t is chosen uniformly and randomly from {I : I = 1, • ■ ■ , N}. Then the expectation of V is 

t 

Let X be the random variable X = (V, <j>J), where ^(t) = N~$ e - 2wluJt / N . We have 

E W = jj E nsWMt) = <§M> (3.4) 
t 

and 

2 -|^ll2- (3.5) 

Define another random vector W as the average of L independent realization of V, with L = 8e~ 2 . 
Let a random variable 

F = (W,^). (3.6) 

Then E[Y] = S(cu) and var[Y] = var[X}/L = e 2 ||S|| 2 /8, so that Prob (\Y - S(uj)\ 2 > e 2 ||5|| 2 ) < 
1/8. 

Set Z = median K Y, where K = 21og(l/6). If \Z - S(uj)\ 2 > e 2 \\S\\ 2 , then for at least half of the 
Ys, we have 

\Y-S(u;)\ 2 >e 2 \\S\\ 2 . (3.7) 

Therefore 

P(|Z-5H| 2 > £ 2 ||5|| 2 ) < J2 

3=K/2 

< %- K ' 2 2 K = 2- K ' 2 < S. (3.8) 

So with probability 1 — 6, Z is a good estimate of the Fourier Coefficient S(oj), good up to tolerance 
e 2 \\S\\ 2 . D 

Several observations and new techniques can speed up the coefficient estimation even further. 

One observation is that fewer samples are already able to give an estimation with desirable 
accuracy and probability. Our arguments indicate that 16e _2 | log(<$)| samples per coefficient suffice 
to obtain good approximations of the coefficients. The estimates used to obtain this bound are 
rather coarse, however. In a practical implementation, if a multi-step evaluation is used (see below), 
it turns out that three steps, in which every step uses 10 samples per mean, and 5 means per median, 
for a total of 150 samples (per coefficient) already determine the coefficient with accuracy e = 10 -4 . 
The major factor in this drastic reduction (from 16 • 10 s | log <5| to 150) is the much smaller number 
of means used; in practice, the dependence on e grows much slower than e -2 as e — > If the signal 
is contaminated by noise or has more than one significant mode, we need more samples for a good 
estimation of the same accuracy. 

An additional difference with the sampling described in [3] is that one can replace individual 
random samples by samples on short arithmetic progressions with random initial points. This 
technique became one of several components in the RA£SFA version of |3j that adapted the original 
algorithm in order to obtain linearity in B. For a description of the arithmetic progression sampling, 
we refer to 0]. Surprisingly, this change not only improves the speed, but also gives a closer 
approximation than simply random sampling, using the same number of samples. 



(i- 



E \X 



5H| 2 ) <E(\X\ 2 ) = ±J2 




Theoretical and Experimental Analysis of a Randomized Algorithm for Sparse Fourier Analysis 



11 



Another idea is a coarse-to-fine multi-step estimation of the coefficients. There are several 
reasons for not estimating coefficients with high accuracy in only one step. One of them is that 
increasing the accuracy e means a corresponding quadratic increase of the number of samples 
0(| \og8\e~ 2 ). A multi-step procedure, which produces only an approximate estimate of the coef- 
ficients in each step, achieves better accuracy and speed. To explain how this works, we need the 
following lemma. 

Lemma 3.5. Given a signal S, let io\, . . . ,u> q be q different frequencies, and define (3: = 
ll^lll — ^(^i)! 2 /Il^lli- Estimate the coefficients S(tUi) where i = 1, . . . , q by the following 

iterative algorithm: apply Algorithm 13.31 with precision e and probability of failure S; keep the 
parameters fixed throughout the iterative procedure, and let Z™ , i = 1, . . . , q, be the estimate (at the 
n-th iteration) of the uii-th Fourier coefficient of S — 53fe=i ^j^j- The total estimate R n 

after the n-th iteration is thus R n — ^j^w, ■ Then 

£ \s(^) - Rn(uj)\ 2 < Y^PWSW 2 + (i£ 2 ) n \\s\\\ (3.9) 

with probability exceeding (1 — 6) nq . 

Proof. (This is essentially a simplified version of proof for Lemma 10 in fJQ 
By Lemma T3.4I 

n-1 

\Zf + J2 Z i~ SMI 2 < e 2 HS - ^n-ill 2 , (3.10) 
with probability exceeding 1 — <5. It follows that 

q n 



£ - £z*| 2 < ge 2 ||S - i^_x]| 2 , (3.11) 



i=l fe=l 



so that 



||S-i?„|| 2 < J2 \S(u)\ 2 + qe 2 \\S-R n - 1 \\ 2 

u^{ui,...,w g } 

= \\S\\ 2 -^TlSi^)] 2 + qe 2 \\S - Rn^f, (3.12) 

i=l 

= /3||5|| 2 + (7e 2 ||5-i?„_ 1 || 2 
with probability exceeding (1 — 5) q . 

Consider now the sequence (a n ), defined by a n — (3\\S\\ 2 + qe 2 a n _i, where oq = \\S\\ 2 . It is easy to 
see that 

n-i 

a„ = /3||5|| 2 ^( g eY + ( 9 6 2 )"||5|| 2 (3.13) 

fc=0 

= P\\s\\ 2 l Y^j^ + (^) n \\s\\ 2 . 

It then follows by induction that \\S — -R n || 2 < a n , with probability exceeding (1 — 5) nq , for all n; 
we have thus 



1 - (qe 2 ^ n 
1 



||S - i?„|| 2 < P\\S\\ 2 l _lJ ) + {^ 2 ) n \\S\\ 2 (3.14) 



- /3||5||2 t - ^) + (^ 2 rii5ir, 
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or equivalently, 

J2 \§(<»i) - Rn(^)\ 2 = \\s- rj 2 - p\\s\\ 2 < PWSW'j^-z + (qt 2 T\\s\\ 2 , (3.15) 

with probability exceeding (1 — 8) qn . □ 

The above lemma shows that repeated rough estimation can be more efficient than a single 
accurate estimation. To make this clear, if we set 

qe 2 = (3-^ + (qe 2 r, (1 - 6)" = (1 - , (3.16) 

then a one-step procedure with parameters e, 5 will achieve the same precision as an n-step iterative 
procedure with parameters e, 5. The one-step procedure will use Cqe~ 2 \ log(J) | sampling steps; the 
iterative procedure will use Cnqi~ 2 \ log(<5)|. It follows that the n-step iterative procedure will be 
more efficient, i.e., obtain the same accuracy with the same probability while sampling fewer times, 
if 

ne- 2 |log(5)|<e- 2 |log(5)|, (3.17) 

under the constraints (|3.16|) . If (3 = (that is, if S is a pure g-component signal), then this condition 
reduces (under the assumption that 5, 6 and e, e are small, so that ^ p ~ qi 2 , (1 — 5) n ~ 1 — no) 
to 

ndlog^l+n)^ 2 )"- 1 < | log S\, (3.18) 

which is certainly satisfied if e is sufficiently small and n sufficiently large. If (3 ^ 0, matters are 
more complicated, but by a simple continuity argument we expect the condition still to be satisfied 
if (3 is sufficiently small. If (3 is too large, (e.g. if /3 > , where no is the minimum value of n for 
which (|3.18|) holds), then there are no choices of n, e, 5 that will satisfy l|3.16[l and l|3.17|l . On the 
other hand, (3 can be large only if S has important modes not included in u>i, . . . , u> q . In practice, 
we use the multi-step procedure after the most important modes have been identified so that (3 is 
small. For sufficiently small (3, we do gain by taking the iterative procedure. For example, assume 
that (3 = 1(T 2 , for a signal of type S = (pi + cf> 2 with N = 1000, q = B = 2, S = 2~ 7 , e = 4 ■ 10~ 4 , 
and with n = 3, theoretically we would then use 450,000 samplings for the one-step procedure, 
versus 150 samples for the iterative procedure. Note that we introduced the parameter (3 only for 
expository purposes. In practice, we simply continue with the process of identifying modes and 
roughly estimating their coefficients until our estimate of the residual signal is small; at that point, 
we switch to the above multi-step estimation procedure. 

3.3. Estimate Norms. The basic principle to locate the label of the significant frequency 
is to estimate the energy of the new signals obtained from isolation and group testing steps. The 
new signals are supported on only a small number of taps in the time domain and have 98% of 
their energies concentrated on one mode. The original analysis in only gave its loose theoretical 
bound. Here we find the empirical parameters, i.e., the number of samples for norm estimation. 

Here is a new scheme for estimating norms, which uses much fewer samples than the original 
one and still achieves good estimation. It can ultimately be used to find the significant mode in 
conjunction with Group Testing and MSB, below. 

Algorithm 3.6. Estimate Norms Input: signal S, failure probability 6. 

1. Initialize: the number of samples: r = [12.51n(l/<5)J . 

2. Take r independent random samples from the signal S: S(i\), . . . , S(i r ), where r is a mul- 
tiple of 5. 

3. Return Nx "60-th percentile of" IS^i)! 2 , . , |<S(ir)| 2 - 

The following lemma presents the theoretical analysis of this algorithm. 

Lemma 3.7. If a signal S is 93% pure, the number of samples r > 12.51n(l/(5), the output of 
Alaorithm XWM gives an estimation X of its energy which exceeds 0.3|j«S'|| 2 with probability exceeding 
1-5. 
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Proof. Without loss of generality, suppose that \\S\\ = 1. Suppose the signal S — afi^ + e, where 
| a. | 2 > 0.93||S'|| 2 , and 4> mega and e are orthogonal. We shall sample the signal S independently for 
r times, as stated in Algorithm 13. 61 Note that we do not impose that samples be taken at different 
time positions; with very small probability, the samples could coincide. Let T — {t : N\S(t)\ 2 < 
0.3||S|| 2 }. Hence, for any t € T, we have y/N\S(t)\ < V03 = 0.5477. Also by the purity of S, we 
have ||e|| 2 < 0.07. Since \S(t)\ > \a^{t)\ - \e(t)\, we obtain 



/ N\e(t)\> \a\-VN\S{t)\. (3.19) 
then for any t G T, 



ViV|e(t)| > %/093- V03. (3.20) 

Therefore, 

OWN > N\\e\\ 2 > N^\e{t)\ 2 > (V093 - \/(13) 2 |T|. (3.21) 



It follows that 



|T| < 0.403A^ (3.22) 

Let a — the above inequality becomes < a < 0.403. 
Consider now the characteristic function \t of the set T, 

f 1 if t G T , 

XT(t) = \ n . (3.23) 
I otherwise, 

and define the random variable Xt as xt(*), where i is picked randomly Then we have 

ITI 

E(X T ) = j+ < 0.403, (3.24) 

and 

E(e XTZ ) = e Q Prob( X T(i) = 0) + e z Prob{ XT {i) = 1) = 1 - a + ae z . (3.25) 

Suppose now we sample the signal S r times independently, and obtain <S(ti), . . . , S(t r ), where 
h, . ..,t r S [0, N}. Take the 60-th percentile of the numbers N\S(ti)\ 2 , . . . , N\S(t r )\ 2 . By Chernoff's 
standard argument, we have for z > 

Prob (60-th percentile < 0.3||S'|| 2 ) = Prob (0.6r of the samples' t belong to T) 

= Prob( XT (ti) + ■■■ + xAU) > 0.6r) 
< e -°- 6 '' 2 £:(e^^=i XT( ^ ) ) 

= [(l-a)e- afo +ae a4z ] r . (3.26) 

Take z = ln(1.5(l - a) /a), then 

(1 - a)e-°- 6z + ae a4z = 1.96a a6 (l - af A . (3.27) 

The right hand side of l|3.27|l is increasing in a on the interval [0, 0.403]; since a < 0.403, we obtain 
an upper bound by substituting 0.403 for a: 

[(1 - a)e~°- 6z + ae 0Az ] r = [l.96a 6 (l - a) QA ] r < e -°- 08r . (3.28) 

So for r > 12.5 ln(l/(5), we have 

Pro6(Output of Algorithm3. 6 > 0.3||S'|| 2 ) = Pro6(60-th percentile of N\S(t)\ 2 > 0.3||S'|| 2 ) (3.29) 

>1-S. 



14 



J. ZOU, A. GILBERT, M. STRAUSS, I. DAUBECHIES 



□ 

In practice, we often generate signals that are not so pure and thus need more samples for norm 
estimation. Although the estimation is sometimes pretty far away from the true value, it gives a 
rough idea of where the significant mode might be. When we desire more accuracy, a smaller 
constant C in the number of samples Clog(l/<5) is chosen. In the statement of the algorithm, we 
choose r to be a multiple of 5, so that the 60-th percentile would be well-defined. In practice, it 
works equally well to take r that are not multiples of 5 and to round down, taking the L3r/5j-th 
sample in an increasingly ordered set of samples. 

We shall also need an upper bound on the outcome of Algorithm 13.61 which should hold 
regardless of whether the signal S is highly pure or not. This is provided by the next lemma, which 
proves that for general signals, Algorithm 13.61 produces an estimation of the energy, that is less 
than 2||S'|| 2 with high probability. 

Lemma 3.8. Suppose Alaorithm \H.b\ generates an estimation X for \\S\\ 2 , then 

, , v 0.1441n(l/(5) 

Prob(X > 2\\S\\ 2 ) <( -J = 8 - 1 . (3.30) 



Proof. Suppose r independent random samples are S(t\), Sfa), . . . , S(t r ), then 

Prob{N\S{U)\ 2 > 2\\S\\ 2 ) < ^i|,^ )|2) = 1/2. (3.31) 



Since X is the 60-th percentile of the sequence NS(t\), . . . , NS(t r ), with r = 0.36 ln(l/ <5), 

/ 1 \ 0.1441n(l/,5) 

Prob(X > 2\\S\\ 2 ) < (Prob(N\S(U)\ 2 > 2||S|| 2 ))°- 1441n(1/5) < I ±J = S 01 . (3.32) 

□ 

3.4. Isolation. Isolation processes a signal S and returns a new signal with significant fre- 
quency u>, with 98% of the energy concentrated on this mode. A frequency u> is called "significant" 
for S , if IS^cj)! > 7y||5|| 2 , where r\ is a threshold, fixed by the implementation, which may be fairly 
small. More precisely, the isolation step returns a series of signals i*b, . . . , F r , such that, with 
high probability, \Fj(uj)\ 2 > 0.98\\Fj\\ 2 for some j, that is, at least one of the F ,Fi : . . . ,F r is 98% 
pure. 

Typically, not all of the FiS are pure. We shall nevertheless apply the further steps of the 
algorithm to each of the i^s, since we don't know which one is pure. An impure Fi may lead to a 
meaningless value for the putative mode uji located in Fi. This is detected by the computation of 
the corresponding coefficients: only when the coefficient corresponding to a mode is significant do 
we output the mode and its coefficient. Some impure signals might output an insignificant mode. 
Hence, we estimate and compare their coefficients to check the significance of the modes. Finally, 
we only output the modes with significant coefficients. 

The discussion in 3 proposes a B-tap box-car filter in the time domain, which corresponds 
to a Dirichlet filter with width ^ in the frequency domain. The whole frequency region would be 
covered by random dilation and translations of this filter. 

Notation: as in [TI], we define a box-car filter Hk as Hk(t) = 2fc+i X[-k.k] > where fceN. 

Lemma 3.9. 

1. For all k, 

- . N 1 A -2xic t sin(7r(2fc + lWiV) 
H ^ = 2fcTT t 5 fe ^ = (2 fc Yl)sm(l/A0 - ^ 

2. Notation: Hkj(t) — e 27ri: ' t ^ 2k+1 " > Hk(t) in the time domain, which is equivalent to a shift 
of Hk{oj) by jN/(2k + 1) in the frequency domain. 
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3. Notation: Define R e ,aS{t) by R , a S{t) = e -^iet/aN F{t/a), so that R^S = S(aw + 9)., 
where Rs,a *s a dilation and shift operator in the frequency domain. 
More detailed description of the Box-car filter can be found in 0. 

The isolation procedure in randomly permutes the signal S and then convolves it with a 
shifted version of Hkj to get a series of new signals Fj = H^j * Re,aS, where j — 0, ... ,2k. 
This scheme does not work well in practice. In the new version of the isolation steps, each Fj = 
Hk * Rgj^jS corresponds to different randomly generated dilation and modulation factors, with 
j = 0, . . . , log(l/<5), the parameters Uj and N are relatively prime. These factors are taken at 
random between and N — 1. The following lemma is similar to Lemma 8 in [3] for the new 
isolation step, with more explicit values of the parameters. 

Lemma 3.10. [3] Let a signal S and a number r\ be given, and create log(l/(S) new signals: 
F , . . .,F log{1/s) with Fj = H k * Re^cTjS, where j = 0, . . . ,log(l/<5). If k > 12.25(1 - rj^/r/ , then 
for each lo such that \S{u>)\ 2 > 7y||S'|| 2 , there exists some j such that with high probability 1 — 5, the 
new signal Fj is 98% pure. 



Proof. Suppose tr - (w — 9j) falls into the pass region of the Hk filter, i.e., that \<j, 1 {u 



< 



N 

2(2fc+l) 



We know that 



H k (aj\u-0 j )) >2/tt, 



(3.34) 



so that 



Fj (oj\u-ej)) 2 > (2/tt) 2 S(u>) 2 > (2M 2 V \\S\\ 2 . 



(3.35) 



greater than the average value, l/(2k + 1), of \H k \ 2 . Since \Hk(o-j (u> — 0j))\ 2 is greater than the 
average value of Hk , we have 



y , 



W^iJj (u)-6j) 



\H k {uo 



l\\2 



N -1 



< 



N 2k + 1 



(3.36) 



Moreover, E w »^ l^')| 2 < (1 ~ ??)II<S|| 2 . In particular, \S{u')\ 2 < (1 - r/)||S*|| 2 if a/ ^ w. We then 
have 



E 



±N/(2k + 1) < trj\w - 8j) < ±N/(2k + 1) 



Define X to be the random variable 



A = { £ I^V)! 2 

LO'^jtoJ 1 (U>— 6j) 



- ±N/(2k + 1) < aj\u - 0j) < ±N/(2k + 1) 



(3.38) 



For this random variable, we have 
/ x 

Prob 



> 1/49 J = Prob (X > {FjiaJ 1 ^ - 8 3 ))\ 2 /49 



< 



E{X) 



49(1 - fy)?!- 2 

\Fjiaj\cJ ejW/49 ~ 4r7(2fc + l) 



< 



(3.39) 



Since k > 12.25(1 — r\)-K 2 jr\, the right hand side of (4.37) is < 1/2, meaning that the signal Fj 
is 98% pure with probability > 1/2. The success probability, i.e., the probability of obtaining at 
least one Fj that is 98% pure, can be boosted from ~ to probability 1 — 5 by repeating 0(log(l/<5)) 
times, i.e., generating 0(log(l/<5)) signals. □ 
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The above lemma gives a lower bound for the filter width. Obviously, the larger the width 
in the time domain, the higher the probability that the frequency will be successfully isolated. 
However, a larger width leads to more evaluations of the function and therefore more time for each 
isolation step. One needs to balance carefully between the computational time for each iteration 
step and the total number of iterations. 

Based on several numerical experiments, we found that a very narrow filter is preferable and 
gives good performance; for instance, the filter with three-tap width, i.e., k = 1 works best for a 
signal with 2 modes. For the choice k = 4, the algorithm ends after fewer iterations; however, each 
iteration takes much more time. The choice of a 9-tap width filter makes the code four times slower 
in total. 

The filter width is weakly determined by the number of modes in the signal, not by the length 
of the signal. Through experimentation, we found that when the number of modes is less than 8, 
the 3-tap width filter works very well; as the number of modes increases, larger width filters are 
better. Numerical experiments suggests a sublinear relationship between the width of the filter and 
the number of modes; in our experiments a 5-tap filter still sufficed for B = 64. 

3.5. Group Testing. After the isolation returns several signals, at least one of which is 98% 
pure with high probability, group testing aims at finding the most significant mode for each. We 
use a procedure called Most Significant Bit (MSB) to approach the mode recursively. 

In each MSB step, we use a Box-car filter Hu to subdivide the whole region into 2k + 1 
subregions. By estimating the energies and comparing the estimates for all these new signals, we 
find the one with maximum energy, and we exclude those that have estimated energies much smaller 
than this maximum energy. We then repeat on the remaining region, a more precisely on the region 
obtained by removing the largest chain of excluded intervals; we dilate so that this new region fills 
the whole original interval, and split again. The successive outputs of the retained region gives an 
increasingly good approximation to the dominant frequencies. The following are the Group testing 
procedures: 

Algorithm 3.11. Group Testing 
Input: signal F, the length N of the signal F. 

Initialize: set the signal F to F , iterative step i = 0, the length N of the signal, the accumulation 

factor q = 1. 

In the ith iteration, 

1- Ifq> A, then return 0. 

2. Find the most significant bit v and the number of significant intervals c by the procedure 
MSB. 

3. Update i = i + l, modulate the signal Fi by and dilate it by a factor of 4(2fc + l)/c. 
Store it in -Fj+i . 

4- Call the Group testing again with the new signal Fi, store its result in g. 

5. Update the accumulation factor q = q* 4(2fc + l)/c. 

6. Ifg> N/2, then g = g-N. 

7. return modd^^ + +0.5j,A); 
The MSB procedure is as follows. 

Algorithm 3.12. MSB (Most Significant Bit) 
Input: signal F with length N , a threshold < n < 1 . 

1. Get a series of new signals Gj(t) = F(t) * ( e 2™jt/4(2fc+i) Hk ^ j = 0, . . . , 8fc + 4. That is, 

each signal Gj concentrates on the pass region [ ^^k+i) » ^4(2fc+i)^ ] '■= P ass j- 

2. Estimate the energies ej of Gj, j = 0, . . . , 8k + 4. 

3. Let I be the index for the signal with the maximum energy. 

4- Compare the energies of all other signals with the Ith signal. If a < nei , label it as an 
interval with small energy. 

5. Take the center v s of the longest chain of consecutive small energy intervals, suppose there 
are c s intervals altogether in this chain. 

6. The center of the large energy intervals is v = 4(2fc + 1) — v s , the number of intervals with 
large energy is c = 4(2fc + 1) — c s . 
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7. If c > 4(2fc + l)/2, then do the original MSB 3 to get v and set c — 2, and v = 
center of the interval with maximal energy. 

8. Output the dilation factor c and the most significant bit v. 

Lemma 3.13. Given a signal F with 98% purity, suppose Gj(t) — F * e 2 ™ jt / i{2k+ ^ H k {t). If 
k>2, then Alaorithm \[\.\l\ can find the significant frequency lo of the signal F with high probability. 

Proof. Suppose the filter width of Hk is 2k + 1. Observe that, for some j, < j < 4(2fc + 1), 
u> G passj. Without loss of generality, assume j = 0. Now consider the signal Go- Since u> € passo, 
the Fourier coefficient Gq{lo) satisfies 

^'"»^( (2 t+1) ;;:waU + i)) ) 2|f( " )|2 <mo) 

> 0.9744 2 • 0.98||F|| 2 « 0.93H.FII 2 . 

for all k > 0. It follows from Lemma T3. 71 that the output of Algorithm 13. 61 applies to Go, estimate 
that is at least 

0310,1- > 0.3|d„ M |* > 0.3-0.98 ( (2t + !„ )' II'- (MD 

On the other hand, now consider G5. Note that 

|c„M| - IFMIlffiHI < {2k + 1) , m{ l /m+1)) \FM\ ("2) 

< II-FII 

- (2fc + l)sin(97r/8(2A; + l)) 11 11 

Also, ||G 5 || 2 - |G 5 (w)| 2 < 0.02||^|| 2 , because F is 98% pure. Thus 

||G 5 f < |fl. M |> + O02||Ff = (( |a+1)sm(9 1 j/8(2t + 1)) ) 2 + 0.02) IIFf . (3.43) 

By Lemma f3. 81 if we use Algorithm 13. 61 the estimation result for G5 will be at most 2 1 1 C5 1 1 2 with 
high probability. It is easy to show that the inequality 

294 ^ S[n{7r/8) V > 2 T - V+0 04 (3 44) 

\{2k+ l)sin(7r/8(2/c+ 1))/ " \(2k + l)sin(9n/8(2k + 1)) J y ' ' 

holds for all k > 0. The same argument applies to Gj with 5 < j < 4(2fc + 1) — 5. It follows that, 
with high probability, the result of applying Algorithm l3.6l to Go will give a result that exceeds the 
result obtained by applying Algorithm 13.61 to Gj with 5 < j < A{2k + 1) — 5. 

In general, if the pass region is at some jo, we can compare ||Gj || 2 with ||Gj|| 2 for all \j — jo\ > 5. 
If there is some jo for which the estimation of HGjJj 2 is apparently larger than 1 1 G j 1 1 2 , then we 
conclude u) ^ passj] otherwise, possibly lo € passj. By the above argument, we can eliminate 
4(2fc + 1) — 9 consecutive pass regions out of the 4(2fc + 1), leaving a cyclic interval of length at 
most 4 ( 2 9 fc+i) ■ I 11 order for the residual region to be smaller or equal to half of the whole region, we 
need 4(2fc + 1) > 18, which is equivalent to the condition k > 2. 

In the recursive steps, let P denote a cyclic interval with size at most ^k+i) that includes all 

the possibilities for lo. Let v denote its center. Then generate a new signal F\(t) — e - 2j!lvt / N F(t); 
this is a shift of the spectrum of F by —v. Thus the frequency lo — v is the biggest frequency of 

4.5JV tri 4.5JV 
' 4(2fe+l)) lU _l "4(2fc+l)) ' 



Fi(t)., which is in the range of — ^ftf+i)) to + 4(2fe+i)) • ^ e wm now see ^ w — v - 

Since we rule out a fraction of ^fc^ij length of the whole region, we may dilate the remainder 
by [4(2fc + 1)/9J , which can be accomplished in the time domain by dilating F\ by ^ 2 k+i) ■ Thus 
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the interval of length just less than ^k+i) known to contain u — v is dilated to the alternate 
positions in an interval of length just less than N. We then rule out again ^k+i) °^ trns dilated 
frequency domain, leaving a remainder of length at most ^ 2 k+i) l en gth. Then we undo the dilation, 
getting an interval of length just less than ( 4 ( 2 9 fc+i)) > centered at some v 2 , which is the second most 
significant bit of oj in a number base [ 4 ( 2 ^ +1 ) j . We would repeat this process to get the other bits 
of uj. By getting a series of Vi, . . . , v \\og M:2k+l)/ g ivj+i> we can recover the u. □ 

In fact, a narrower filter with a larger shift width than 4 ^ 2 k+i) wor ks nne and makes the 
algorithm faster in practice. Heuristically, we find that the optimal number of taps for small B 
cases is 3. Suppose the MSB filter width is 3 and each MSB rules out 2 intervals out of 3, then 
the total number of recursive group test is log 3 N. Then the computational cost is 3 log 3 N norm 
estimations and 2 log 3 N comparisons. Numerical experiments suggests that k is probably linear in 
logB. The shift width we use in practice is ^TT- 

We find that the output of group testing in both the original and the present version of RA£SFA 
might differ from the true mode by one place. We suspect that the reason is that all the float oper- 
ations and the conversion to integers introduce and accumulate some error into the final frequency. 
As a solution, the coefficients of nearby neighbors are also estimated roughly to determine the true 
significant modes. 

3.6. Sample from a transformed signal. A key issue in the implementation consists of 
obtaining information (by sampling) from a signal after it has been dilated, modulated, or even 
convolved. We briefly discuss here how to carry out this sampling in discrete signals. 

First, we consider a dilated and modulated signal, for example, in the isolation procedure which 
uses R ,aS{t) = e- 27Tm / aN S(t/a), which is equivalent to (R^S)(u) = S(au> + 6) in the frequency 
domain. Here a and 9 are chosen uniformly and randomly, from to N — 1 for 9, and from 1 to 
N - 1 for a. The sample R e ,„F{t), where t e {0, 1, . . . , N - l},is then e- 2nm / aN (Re, a )F(t) = 
e- 2met / aN F(cr*t), where a* is chosen so that a* a = l(modN). If N is prime, then we can always 
find (a unique value for) a* for arbitrary a; if N is not prime, a* may fail to exist for some choices 
of a. Our program uses the Euclidean algorithm to determine a*; when N is not prime and a and 
N are not co-prime, the resulting candidates for <r* arc not correct and may lead to estimates for 
the modes that are incorrect; these mistakes are detected automatically by the algorithm when it 
estimates the corresponding coefficient and finds it to be below threshold. 

We also need to sample from convolved signals, e.g. S * Hk(t). Because Hk has only 2k + 1 
taps, only 2k + 1 points contribute to the calculation of the convolution. Since S * Hk(t) = 
J2i=-k Hk{i)S(t — i), we need only the values S(t — i), i = —k, . . . , k, all of which we sample. 

3.7. Extension to a Higher Dimensional Signal. The original RA^SFA discusses only the 
one dimensional case. As explained earlier, it is of particular interest to extend RA^SFA to higher 
dimensional cases because there its advantage over the FFT is more pronounced. 

In d dimensions, the Fourier basis function is 

<fo(*) ^^...^(zi,...,^^-^^^ (3.45) 
the representation of a signal is 

N 

S(x!, ...,Xd) = 'y]c i <l>ui ll ,...,u i , d - (3-46) 

i=l 

Suppose the dimension of the signal is d, denote x = (x\,X2, ■ ■ ■ , Xd), C3 = (oji, . . . , u>d)- 

The total scheme remains much the same as in one dimension: 

Algorithm 3.14. Total Scheme in d dimensions 
Input: signal S, the number of nonzero modes B or its upper bound, accuracy factor e, success 
probability 1 — 8, an upper bound of the signal energy M, the standard deviation of the white 
Gaussian noise a. 
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1. Initialize the representation signal R to 0, set the maximum number of iterations T = 
Blog(N)log(S)/e 2 ,. 

2. Test whether \\S — R\\ 2 < t||i?|| 2 . If yes, return the representation signal R and the whole 
algorithm ends; else go to step 3. 

3. Locate Fourier Modes uj for the signal S — R by the new isolation and group test procedures. 
4- Estimate Fourier Coefficients at uj: (S — R)(uj). 

5. If the total number of iterations is less than T , go to 2; else return the representation R. 

The most important modification with respect to the one dimensional case lies in the procedure 
to carry out step 3 of Algorithm 13.141 We adapt the technique for frequency identification to fit 
the high dimensional case; it is given by the following procedure. 

Algorithm 3.15. Locate the Fourier mode in d dimensions Input: signal S, accuracy 
factor e, success probability 1 — 5, an upper bound of the signal energy M . 

1. Random permutations in d dimension. 

2. Isolate in one (arbitrarily picked) dimension i to get a new signal F(t) = S * Hk(t). 

3. For each dimension i' , find the i'th component uj*, of the significant frequency by Group 
Testing for the signal F in the i'th dimension. 

4- Finally, estimate the Fourier coefficients in the frequency to = (wq^.^w^J. Keep the 
frequency d-tuple if its Fourier coefficient is large. 

Note that the computational cost of the above algorithm is quadratic in the number of di- 
mensions. The permutation involves a d x d matrix 1 The group test procedure in each dimension 
processes the same isolation signal. If a filter with B taps is used for the isolation, then it captures 
at least one significant frequency in the pass region with probability 1/B. The basic idea behind this 
procedure is that, because of the sparseness of the Fourier representation, cutting the frequency do- 
main into slices of width 1/B in 1 dimension, leaving the other dimensions untouched, leads to, with 
positive probability, a separation of the important modes into different slices. After this essentially 
1-dimensional isolation, we only need to identify the coordinates of the isolated frequency mode. Af- 
ter isolation, we assume F(x) = Ae Ma ^ N , where A and uj are unknown. To find u)y , we sample in 
the j'-th coordinate only, keeping x\, . . . , Xji—i,Xj>+i, . . . , x& fixed, so that (for this step) F{x) can 
be viewed as Ae 2 ™^ 3 ^ = Ae 2 ™ u i' x i'/ N , where A = Ae 27T ^ XlUl+ - +x ^-^ j '^ 1+x ^'+^' +1+ - +XdUJd \ 
remains the same for different Xji and has the same absolute value as A, which we can do in each 
dimension separately by the following argument. 

If we just repeated the 1-dimensional technique in each dimension, that is, carried out isolation 
in each of the d dimensions sequentially, the time cost would be exponential in the dimension d. 
We discuss now in some detail the steps 1, 2, 3 of Algorithm 13. 151 

3.7.1. Random Permutations. In one dimensional RA£SFA, the isolation part includes 
random permutations and the construction of signals with one frequency dominant. However, the 
situation is more complicated in higher dimensions, which is why we separated out the permutation 
step in the algorithm. 

Recall that in one dimension, the signal is dilated and modulated randomly in order to sep- 
arate possibly neighboring frequencies. In higher dimensions, different modes can have identical 
coordinates in some of the dimensions; they would continue to coincide in these dimensions if we 
just applied "diagonal" dilation, i.e., if we carried out dilation and modulation sequentially in the 
different dimensions. To separate such modes, we need to use random matrices. We transform any 
point (xi,a;2, ... , x d ) into (3/1, ... , yd) given by 




an a\2 ■ ■ ■ aid 



ddi a d 2 ■■■ a d d J \ x d J \ b d 



(3.47) 



Note that generalizing to d dimensions our 1-dimensional practice of checking not only the central frequency 
found, but also nearby neighbors, would make this algorithm exponential in d, which is acceptable for small d. For 
large d, we expect it would suffice to check a fixed number of randomly picked nearby neighbors, removing the 
exponential nature of this technical feature. 
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where A = (ay) is a random and invertible matrix, the ay and the bi are chosen randomly, uniformly 
and independently, and the arithmetic is modulo N. For example, if d — 2, N — 7, an = l,ai2 = 
3, a2i = 5, a22 = 2, hi = 0, 62 = 5, that is, 




(3.48) 



the point (1,2) gets mapped to (0,0), (1,3) to (3,2), and (0,3) to (2,4): even though points (1,2) 
and (1,3) have the same first coordinate, their images don't share a coordinate; the same happens 
with points (1, 3) and (0, 3). For each dimension i' , the i'th components of frequencies are mapped 
by pairwise independent permutations. Even adjacent points that differ in only one coordinate are 
destined to be separate with high probability after these random permutations. 

3.7.2. Isolation. After the random permutations, the high dimensional version of isolation 
can construct a sequence Fq, Fi, . . . of signals, such that , for some j, \Fj(u )| 2 > 0.98||F|| 2 . 

Algorithm 3.16. High Dimensional Isolation 
Choose an arbitrary dimension i. 

1. Filter on the dimension i and leave all other dimensions alone, get the signal 

F = S*H k , (3.49) 

where Hk — gCT X[— fc.fc] filters on the dimension i; the other dimensions are not affected. 

2. Output new signals F to be used in the Group Testing. 

3.7.3. Group Testing for Each Dimension. After the random permutation and isolation, 
we expect a d-dimensional signal with most of its energy concentrated on one mode. The isolation 
step effectively separates the d-dimensional frequency domain in a number of d-dimensional slices. 
Group testing has to subdivide these slices. 

One naive method is to apply d dimensional filters in group testing, concentrating on d- 
dimensional cubic subregions in group testing that cover the whole area. However, this leads 
to more cost. If the number of taps of this filter in one dimension is 2k + 1, we obtain (2k + l) d sub- 
regions. Estimating the energies of all subregions slows down the total running time. Consequently 
we instead locate each component of the significant frequency label separately. That is, we only use 
a filter to focus on one dimension and leave other dimensions alone. The energy of 2k + 1 regions 
are computed in every dimension. Hence, we need to estimate the norm of d(2fc + 1) intervals in 
total. This makes Group Testing linear in the number of dimensions, instead of exponential as in 
the naive method. 

Here is the procedure in Group Test: 

Algorithm 3.17. High Dimensional Group Test 
For i' = l,...,d 

1. Construct signals G ( f ] = F(t) * ( e 2 «i t .'/( 2Z + 1 ) J ff,), j = 1, . . . ,21 + 1, where Hi filters on 
i'th dimension and leave all other dimensions alone; 

2. Estimate and compare the energy of each , j — 1, . . . , 21 + 1, use the similar procedure 
in one dimensional group testing procedure. Find the candidates u*, . 

The reader may wonder how sampling works out for this d-dimensional algorithm. In Algorithm 
13.171 we will need to sample G^ ' (which is the convolution of the (permuted version of) signal S 
with 2 filters) to estimate its energy; because filtering is done only in the i'-th dimension, we shall 

sample G^ (x\, . . . , Xi>-\, xy, av+i, ■ ■ ■ , Xd) for different Xy , but keeping the other Xj fixed, where 
j =/= i' . The signal F itself comes from the Isolation step, in which we filter in direction i, for which 
S needs to be sampled, in this dimension only. Together, for each choices i' in Algorithm 13 . 161 and 
13.171 this implies we have (2k + 1) x (21 + 1) different samples of (the permuted version of) S, in 
which all but the ith coordinates of the samples x are identical. 

4. Conclusion. We provide both theoretical and experimental evidence to support the ad- 
vantage of the implementation of RA^SFA proposed here over the original one sketched in 
Moreover, we extend RA^SFA to high dimensional cases. For functions with few, dominant Fourier 
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modes, RA^SFA outperforms the FFT as N increases. We expect that RA^SFA will be useful as 
a substitute for the FFT in potential applications that require processing such sparse signals or 
computing B-term approximations. This paper is just the beginning of a series of our papers and 
researches, many of which are in preparation. For example, the strong dependence of running time 
on the number of modes B will be further lessened, and thus the algorithm would work for more 
interesting signals 4 . Also, the application of RAl?SFA in multiscale problems will be discussed in 

m 
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